This paper presents a text query-based method for keyword spotting from online Chinese handwritten documents. The similarity between a text word and handwriting is obtained by combining the character similarity scores given by a character classifier. To overcome the ambiguity of character segmentation, multiple
paper, we propose an automatic keyword extraction system and a Thai website categorization system that can automatically update the dictionary and categorize websites in Thai. The dictionary is a collection of vectors created by the automatic keyword extraction system. The results in terms of accuracy show that our
Text categorization is the main issue which affects search results. Moreover, most approaches suffer from the high dimensionality of feature space. To overcome this problem, the use of feature selection techniques with statistical text categorization is investigated. The methods were evaluated based on Chi-Square, Information Gain and Gain Ratio. The data used to test the system consisted of 1,510...
Focused crawling is a means of acquiring raw big data materials from the web. This paper proposes a focused crawler for discovering Arabic poetry resources based on the Apache Nutch crawler. The crawler identifies poetry-relevant resources using an SVM classifier and a list of Arabic poetry-related keywords. The
This paper presents an audio keyword detection method for highlight retrieval in basketball video. The keywords comprise shoe squeaking sounds, speech, cheering, long whistles and short whistles, which correspond to basketball game events. After feature analysis, the Simple Excellent Feature Combination based on Pearson
metrics used in text categorization by using local and global policies. For the experiments, we use three datasets which vary in size, complexity and skewness. We use SVM as the classifier and tf-idf for term weighting. We observed that for almost all metrics, the local policy outperforms when the number of keywords is
In this paper, we propose a novel multi-label image annotation method for image retrieval based on annotated keywords. For multi-label image annotation, a bi-coded genetic algorithm is employed to select optimal feature subsets and corresponding optimal weights for each one-vs-one SVM classifier. After an unlabelled image
In this paper we propose an approach for Chinese question analysis and answer extraction. A general question analysis process contains keyword extraction and question classification. Question classification plays a crucial role in automatic question answering. To implement the question classification, we have carried
To exploit co-occurrence patterns among features and target semantics while keeping the simplicity of keyword-based visual search, a novel reranking method is proposed. The approach, ordinal reranking, reranks an initial search list by utilizing the co-occurrence patterns via ranking functions such as ListNet
Domain-specific search focuses on one area of knowledge. Applying broad-based ranking algorithms to vertical search domains is not desirable. The broad-based ranking model builds upon data from multiple domains existing on the web. Vertical search engines attempt to use a focused crawler that indexes only web pages relevant to a predefined topic. With Ranking Adaptation Model, one can adapt an...
Users of search engines interact with the system using queries of different sizes and types. Current search engines perform well with keyword queries but not with verbose queries, which are too long, too detailed, or expressed in more words than are needed. The detection of verbose queries may help search engines to
of HTML page, and the proposed algorithms are performed. A complete evaluation is performed, which indicates the effectiveness of our technique. The experimental results show improved precision and recall with the proposed algorithms with respect to keyword-based search. The algorithms are implemented in Java and its
title, keyword and link text information to represent the website. Heterogeneous classifiers are then built based on these different features. We propose a principled ensemble classification algorithm to combine the predicted results from different phishing detection classifiers. A hierarchical clustering technique has been
This paper presents a novel method to extract Protein-Protein Interaction (PPI) information from biomedical literature based on Support Vector Machine (SVM) and K Nearest Neighbors (KNN). The two protein names, words between two proteins, words surrounding two proteins, keyword between or among the surrounding words
We study the problem of learning to rank images for image retrieval. For a noisy set of images indexed or tagged by the same keyword, we learn a ranking model from some training examples and then use the learned model to rank new images. Unlike previous work on image retrieval, which usually coarsely divides the images
This article proposes a question classification approach that integrates multiple semantic features. It targets two problems in Chinese question classification models: inaccurate extraction of semantic information and slow processing caused by excessively high eigenvector dimension. With the help of HowNet, the support vector machine, and the syntactic and semantic information of question...
better service quality. This study aims to measure GO-JEK and Grab customer satisfaction through sentiment analysis of Twitter data. Both companies use Twitter to reach their customers and promote their services. We collect 126,405 tweets from February to March 2016 containing GO-JEK and Grab keywords. Then, we pre-process
Multi-label image annotation has received significant attention in the research community over the past few years. Multi-label automatic image annotation automatically assigns keywords to an image based on low-level features. In this paper, we present an extensive survey on the research work carried out in the area
Today, location technologies are integrated into many devices, enabling location-based services. Movement data recorded with these devices can be uploaded to web sites and shared with others. Movement data can be organized using keywords and semantic tags, e.g. walking and running. Our main goal is to automatically